Abstract

Motivated by a certain molecular reconstruction methodology in cryo-electron microscopy, we consider the problem of solving a linear system with two unknown orthogonal matrices, which is a generalization of the well-known orthogonal Procrustes problem. We propose an algorithm based on a semi-definite programming (SDP) relaxation, and give a theoretical guarantee for its performance. Both theoretically and empirically, the proposed algorithm performs better than the naive approach of solving the linear system directly without the orthogonal constraints. We also consider the generalization to linear systems with more than two unknown orthogonal matrices.

1. Introduction

In this paper, we consider the following problem: given known matrices $X_1, X_2 \in \mathbb{R}^{N \times D}$ and unknown orthogonal matrices $V_1, V_2 \in O(D)$, recover $V_1$ and $V_2$ from $X_3 \in \mathbb{R}^{N \times D}$ defined by

$$X_3 = X_1 V_1 + X_2 V_2. \tag{1}$$

A naive approach would be solving (1) while dropping the orthogonality constraints on $V_1$ and $V_2$. This linear system has $ND$ linear constraints and $2D^2$ unknown variables; therefore, this approach can recover $V_1$ and $V_2$ when $N \geq 2D$. The question is, can we develop an algorithm that takes the orthogonality constraints into consideration, so that it is able to recover $V_1$ and $V_2$ when $N < 2D$, and more stably when the observation $X_3$ is contaminated by noise?
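
As an illustration of the naive approach (a minimal sketch, not the authors' code), the unconstrained system can be solved by ordinary least squares on the stacked matrix $[X_1, X_2]$. The function name `naive_least_squares`, the sizes, and the random seed are assumptions made for this example only.

```python
import numpy as np

def naive_least_squares(X1, X2, X3):
    """Solve X3 = X1 V1 + X2 V2 with the orthogonality constraints dropped."""
    D = X1.shape[1]
    A = np.hstack([X1, X2])                      # N x 2D design matrix [X1, X2]
    V, *_ = np.linalg.lstsq(A, X3, rcond=None)   # 2D x D stacked solution (V1; V2)
    return V[:D], V[D:]

rng = np.random.default_rng(0)
N, D = 8, 3                                      # N >= 2D, so [X1, X2] has full column rank
X1 = rng.standard_normal((N, D))
X2 = rng.standard_normal((N, D))
V1, _ = np.linalg.qr(rng.standard_normal((D, D)))  # ground-truth orthogonal matrices
V2, _ = np.linalg.qr(rng.standard_normal((D, D)))
X3 = X1 @ V1 + X2 @ V2

V1_hat, V2_hat = naive_least_squares(X1, X2, X3)
err = max(np.linalg.norm(V1_hat - V1), np.linalg.norm(V2_hat - V2))
```

When $N < 2D$ the matrix $[X_1, X_2]$ has a nontrivial null space, so this sketch returns only one of infinitely many unconstrained solutions.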

The associated least squares problem

$$\min_{V_1, V_2 \in O(D)} \|X_1 V_1 + X_2 V_2 - X_3\|_F \tag{2}$$

can be considered as a generalization of the well-known orthogonal Procrustes problem [1]:

$$\min_{V \in O(D)} \|X_1 V - X_2\|_F, \tag{3}$$

with the main difference being that the minimization in (2) is over two orthogonal matrices instead of just one as in (3). Although the orthogonal Procrustes problem has a closed form solution using the singular value decomposition, problem (2) does not enjoy this property.
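
For reference, the closed form of the orthogonal Procrustes problem can be sketched as follows: if $X_1^T X_2 = U \Sigma W^T$ is an SVD, then $V = U W^T$ minimizes $\|X_1 V - X_2\|_F$ over $O(D)$. The helper name `procrustes` and the test sizes are assumptions made for this illustration.

```python
import numpy as np

def procrustes(X1, X2):
    """Return the orthogonal V minimizing ||X1 V - X2||_F via the SVD of X1^T X2."""
    U, _, Wt = np.linalg.svd(X1.T @ X2)
    return U @ Wt

rng = np.random.default_rng(1)
N, D = 6, 4
X1 = rng.standard_normal((N, D))
V_true, _ = np.linalg.qr(rng.standard_normal((D, D)))  # ground-truth orthogonal matrix
X2 = X1 @ V_true                                       # noiseless observation

V_hat = procrustes(X1, X2)
```

No analogous one-shot formula is available for two unknown matrices, which is why a relaxation is used in the remainder of the paper.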

Still, (2) can be reformulated so that it belongs to a wider class of problems called the little Grothendieck problem [2], which again belongs to QO-OC (Quadratic Optimization under Orthogonality Constraints) considered by Nemirovski [3]. QO-OCs have been well studied and include many important problems as special cases, such as Max-Cut [1] and generalized orthogonal Procrustes [3E1[7]

$$\min_{V_1, \ldots, V_n \in O(D)} \sum_{1 \leq i, j \leq n} \|X_i V_i - X_j V_j\|_F^2,$$

which has applications to areas such as psychometrics, image and shape analysis and biometric identification.

The non-commutative little Grothendieck problem [H] is defined by:

$$\min_{V_1, \ldots, V_n \in O(D)} \sum_{i,j=1}^{n} \mathrm{tr}(C_{ij}^T V_i V_j^T). \tag{4}$$

Problem (2) can be considered as a special case of (4) with $n = 3$. The argument is as follows. For convenience, we homogenize (1) by introducing a slack unitary variable $V_3 \in O(D)$ and consider the augmented linear system

$$X_1 V_1 + X_2 V_2 + X_3 V_3 = 0. \tag{5}$$

Clearly, if $(V_1, V_2, V_3)$ is a solution to (5), then the pair $(-V_1 V_3^T, -V_2 V_3^T)$ is a solution to the original linear system (1). The least squares formulation corresponding to (5) is

$$\min_{V_1, V_2, V_3 \in O(D)} \|X_1 V_1 + X_2 V_2 + X_3 V_3\|_F^2. \tag{6}$$

Let $C \in \mathbb{R}^{3D \times 3D}$ be a Hermitian matrix with the $(i,j)$-th $D \times D$ block given by $C_{ij} = X_i^T X_j$. The least squares problem (6) is equivalent to

$$\min_{V_1, V_2, V_3 \in O(D)} \sum_{i,j=1}^{3} \mathrm{tr}(C_{ij}^T V_i V_j^T), \tag{7}$$

which is the little Grothendieck problem (4) with $n = 3$.
1.1. Motivation

Our problem arises naturally in single particle reconstruction (SPR) from cryo-electron microscopy (EM), where the goal is to determine the 3D structure of a macromolecular complex from 2D projection images of identical, but randomly oriented, copies of the macromolecule. Zvi Kam [9] showed that the spherical harmonic expansion coefficients of the Fourier transform of the 3D molecule, when arranged as matrices, can be estimated from 2D projection images up to an orthogonal matrix (for each degree of spherical harmonics). Based on this observation, Bhamre et al. [10] recently proposed "Orthogonal Replacement" (OR), an orthogonal matrix retrieval procedure in which cryo-EM projection images are available for two unknown structures $\phi^{(1)}$ and $\phi^{(2)}$ whose difference is known. It follows from Kam's theory that we are given the spherical harmonic expansion coefficients of $\phi^{(1)}$ and $\phi^{(2)}$ up to an orthogonal matrix, and their difference. Then the problem of recovering the spherical harmonic expansion coefficients of $\phi^{(1)}$ and $\phi^{(2)}$ is reduced to the mathematical problem (1). If (1) can be solved for smaller $N$, then we can reconstruct $\phi^{(1)}$ and $\phi^{(2)}$ with higher resolution. The cryo-EM application serves as the main motivation of this paper. We refer the reader to [10] for further details regarding the specific application to cryo-EM.

2. Algorithm and Main result

The little Grothendieck problem and QO-OCs are generally intractable; for example, it is well known that the Max-Cut problem is NP-hard. Many approximation algorithms have been proposed and analyzed n n m m 1 US], and the principle of these algorithms is to apply a semi-definite programming (SDP) relaxation followed by a rounding procedure. The SDP can be solved in polynomial time (for any finite precision). Based on the same principle, we relax the problem (7) to an SDP as follows.

Let $H \in \mathbb{R}^{3D \times 3D}$ be a Hermitian matrix with the $(i,j)$-th $D \times D$ block given by $H_{ij} = V_i V_j^T$, that is,

$$H = \begin{pmatrix} V_1 \\ V_2 \\ V_3 \end{pmatrix} \begin{pmatrix} V_1^T & V_2^T & V_3^T \end{pmatrix}.$$

Then (7) is equivalent to

$$\min_{H \succeq 0,\; H_{ii} = I,\; \mathrm{rank}(H) = D} \mathrm{tr}(CH),$$

where $H \succeq 0$ denotes that $H$ is a positive semidefinite matrix. The only constraint which is non-convex is the rank constraint. Dropping it leads to the following SDP:

$$\min\ \mathrm{tr}(CH), \quad \text{subject to } H \succeq 0 \text{ and } H_{ii} = I. \tag{8}$$
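
The lifting behind this relaxation can be checked numerically. The sketch below (an illustration with assumed sizes and seed, not the SDP solver itself) builds $C = X^T X$ and $H = VV^T$ from random orthogonal $V_i$ and confirms that $\mathrm{tr}(CH)$ equals the least squares objective $\|X_1V_1 + X_2V_2 + X_3V_3\|_F^2$, and that $H$ satisfies the constraints $H \succeq 0$, $H_{ii} = I$ and $\mathrm{rank}(H) = D$.

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 5, 3
Xs = [rng.standard_normal((N, D)) for _ in range(3)]
Vs = [np.linalg.qr(rng.standard_normal((D, D)))[0] for _ in range(3)]

X = np.hstack(Xs)      # N x 3D block row (X_1, X_2, X_3)
V = np.vstack(Vs)      # 3D x D block column (V_1; V_2; V_3)
C = X.T @ X            # (i, j) block is X_i^T X_j
H = V @ V.T            # (i, j) block is V_i V_j^T

objective = np.trace(C @ H)
least_squares = np.linalg.norm(X @ V, "fro") ** 2   # ||X_1 V_1 + X_2 V_2 + X_3 V_3||_F^2

# Feasibility of the lifted matrix H for the rank-constrained problem.
diag_blocks_ok = all(
    np.allclose(H[i * D:(i + 1) * D, i * D:(i + 1) * D], np.eye(D)) for i in range(3)
)
min_eig = np.linalg.eigvalsh(H).min()
rank_H = np.linalg.matrix_rank(H)
```

Solving the relaxed problem itself requires an SDP solver; that step is deliberately left out of this sketch.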
If the solution satisfies $\mathrm{rank}(H) = D$, then $V_1$, $V_2$ and $V_3$ are extracted by applying a decomposition to $H$ as follows. Let $H = UU^T$, where

$$U = \begin{pmatrix} U_1 \\ U_2 \\ U_3 \end{pmatrix} \in \mathbb{R}^{3D \times D} \quad \text{and} \quad U_i \in \mathbb{R}^{D \times D},$$

then $V_i = U_i$, $1 \leq i \leq 3$, would be a solution.

Notice that the solution to (5) is not unique: if $(U_1, U_2, U_3)$ satisfies (5), then for any $U \in O(D)$, the triplet $(U_1 U, U_2 U, U_3 U)$ satisfies (5) as well. Although the solution to (5) is not unique, the solution to the original problem (1) is uniquely given by $(-U_1 U_3^T, -U_2 U_3^T)$.

When $\mathrm{rank}(H) > D$, there does not exist $U \in \mathbb{R}^{3D \times D}$ such that $H = UU^T$, and the linear system (1) does not have a solution. However, we could employ the rounding procedure described in m-. Let $H = UU^T$, where

$$U = \begin{pmatrix} U_1 \\ U_2 \\ U_3 \end{pmatrix} \in \mathbb{R}^{3D \times k} \quad \text{and} \quad U_i \in \mathbb{R}^{D \times k},$$

then we generate approximate solutions by $V_1 = f(-U_1 U_3^T)$ and $V_2 = f(-U_2 U_3^T)$, where $f$ is a rounding procedure to the nearest orthogonal matrix as follows. For any $Z \in \mathbb{R}^{D \times D}$ with SVD decomposition $Z = U_Z \Sigma_Z V_Z^T$, $f(Z) = U_Z V_Z^T = Z(Z^T Z)^{-1/2}$ [14].
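
The extraction and rounding steps can be sketched as follows. Since $U_1 U_3^T$ and $U_2 U_3^T$ are exactly the $(1,3)$ and $(2,3)$ blocks of $H = UU^T$, this illustration reads them directly from $H$ rather than computing a factorization; that shortcut, the function names `nearest_orthogonal` and `extract_pair`, and the synthetic test matrix are assumptions of this example.

```python
import numpy as np

def nearest_orthogonal(Z):
    """Rounding f(Z) = U_Z V_Z^T, where Z = U_Z Sigma_Z V_Z^T is an SVD."""
    Uz, _, Vzt = np.linalg.svd(Z)
    return Uz @ Vzt

def extract_pair(H, D):
    """Return f(-U1 U3^T), f(-U2 U3^T) using the blocks H_13 and H_23 of H = U U^T."""
    H13 = H[:D, 2 * D:]
    H23 = H[D:2 * D, 2 * D:]
    return nearest_orthogonal(-H13), nearest_orthogonal(-H23)

rng = np.random.default_rng(3)
D = 4
V1 = np.linalg.qr(rng.standard_normal((D, D)))[0]
V2 = np.linalg.qr(rng.standard_normal((D, D)))[0]
Q = np.linalg.qr(rng.standard_normal((D, D)))[0]   # arbitrary orthogonal slack matrix

# A rank-D feasible H built from the homogeneous solution (-V1 Q, -V2 Q, Q).
U = np.vstack([-V1 @ Q, -V2 @ Q, Q])
H = U @ U.T
V1_hat, V2_hat = extract_pair(H, D)

# Check the identity f(Z) = Z (Z^T Z)^{-1/2} on a random invertible Z.
Z = rng.standard_normal((D, D))
w, E = np.linalg.eigh(Z.T @ Z)
polar_via_inverse_sqrt = Z @ (E @ np.diag(w ** -0.5) @ E.T)
```

The slack matrix `Q` cancels in the products, which mirrors the remark that the solution of the original problem is unique even though the homogenized one is not.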

2.1. Main results

The main contribution of this paper is a particular theoretical guarantee for the SDP approach to return a solution of rank $D$ and recover $V_1$ and $V_2$ exactly. We start with a theorem that controls the lower bound of the objective function in (8). Throughout the paper, for any $d$-dimensional subspace $L$ in $\mathbb{R}^D$, $P_L$ is a projector of size $D \times d$ to the subspace.

Theorem 2.1. For generic $X_1, X_2 \in \mathbb{R}^{N \times D}$ with $N \geq D + 1$, $X_3 = -X_1 - X_2$, $k \geq D$, and

$$\mathcal{U} = \left\{ U = \begin{pmatrix} U_1 \\ U_2 \\ U_3 \end{pmatrix} \in \mathbb{R}^{3D \times k} : U_i \in \mathbb{R}^{D \times k},\ U_1 U_1^T = U_2 U_2^T = U_3 U_3^T = I \right\},$$

then for any $U \in \mathcal{U}$,

$$\|XU\|_F \geq c(X_1, X_2)\, \|U^T P_{L_1^\perp}\|_F^2, \tag{9}$$

where $X = (X_1, X_2, X_3) \in \mathbb{R}^{N \times 3D}$,

$$L_1 = \{x \in \mathbb{R}^{3D} : x = (v, v, v) \text{ for some } v \in \mathbb{R}^D\},$$

and $c(X_1, X_2)$ is a constant depending on $X_1$ and $X_2$ and it is positive for generic $X_1$ and $X_2$.
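
To make the objects in Theorem 2.1 concrete, the following sketch (an illustration under assumed sizes, not part of the proof) constructs an orthonormal basis of $L_1^\perp$ and checks that both sides of (9) vanish when $U_1 = U_2 = U_3$, which is the configuration corresponding to exact recovery. The helper name `basis_L1_perp` is hypothetical.

```python
import numpy as np

def basis_L1_perp(D):
    """Return a 3D x 2D matrix whose orthonormal columns span the complement of L1."""
    B1 = np.kron(np.ones((3, 1)), np.eye(D)) / np.sqrt(3.0)  # orthonormal basis of L1
    Q, _ = np.linalg.qr(B1, mode="complete")                  # Q is a 3D x 3D orthogonal matrix
    return Q[:, D:]                                            # remaining columns span L1-perp

rng = np.random.default_rng(7)
N, D, k = 5, 3, 4                                  # N >= D + 1 and k >= D
X1 = rng.standard_normal((N, D))
X2 = rng.standard_normal((N, D))
X = np.hstack([X1, X2, -X1 - X2])                  # X = (X1, X2, X3) with X3 = -X1 - X2

W = np.linalg.qr(rng.standard_normal((k, D)))[0].T  # D x k block with W W^T = I
U = np.vstack([W, W, W])                            # U1 = U2 = U3
P = basis_L1_perp(D)

lhs = np.linalg.norm(X @ U, "fro")                  # ||X U||_F
rhs = np.linalg.norm(U.T @ P, "fro")                # ||U^T P_{L1-perp}||_F
```

For a general $U \in \mathcal{U}$ the quantity `rhs` measures how far the three blocks are from being equal, and (9) states that `lhs` cannot be small unless `rhs` is.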
Based on Theorem 2.1 and $\|XU\|_F^2 = \mathrm{tr}(CH)$, this paper proves that when $N \geq D + 1$, the SDP method recovers the orthogonal matrices for generic cases, i.e., the property holds for $(X_1, X_2)$ that lies in a dense open subset of $\mathbb{R}^{N \times D} \times \mathbb{R}^{N \times D}$. This is formally stated next. Its part (b) shows that the SDP method is stable to noise.

Theorem 2.2. (a) For generic $X_1, X_2 \in \mathbb{R}^{N \times D}$ with $N \geq D + 1$, the SDP method recovers $V_1$ and $V_2$ exactly.

(b) Under the assumptions in (a), suppose that the input matrices of the SDP method are $\tilde{X}_i$ such that $\|\tilde{X}_i - X_i\|_F \leq \epsilon$ for $1 \leq i \leq 3$. Then the SDP method recovers $V_1$ and $V_2$ approximately, in the sense that the error between the recovered orthogonal matrix $\tilde{V}_i$ and the true orthogonal matrix $V_i$, $\|\tilde{V}_i - V_i\|_F$, is bounded above by $C\sqrt{\epsilon}$ for some $C$ that does not depend on $\epsilon$.


The result (a) shows that the SDP method successfully recovers the orthogonal matrices as long as $N \geq D + 1$, compared with the stringent requirement $N \geq 2D$ for the naive least squares approach. The condition $N \geq D + 1$ is nearly optimal. In (1), there are $ND$ constraints and $D(D-1)$ variables. Hence, it is impossible to recover $V_1$ and $V_2$ when $N < D - 1$.

The result (b) shows that the SDP method is stable to noise in the input matrices. We remark that it might be possible to improve the stability analysis: while the current analysis gives an error of $O(\sqrt{\epsilon})$, the empirical performance usually has an error of $O(\epsilon)$, as shown in Table 2 of Section 3.

We also remark that Theorem 2.2 can be generalized to the complex case: the proof applies to the case of unitary matrices as well. For the complex case, there are $2ND$ constraints and $2D^2$ degrees of freedom. Therefore, it is impossible to recover $V_1$ and $V_2$ when $N < D$. Moreover, we suspect that recovery is impossible even for $N = D$, which would suggest that the sufficient condition $N \geq D + 1$ in Theorem 2.2(a) is also necessary: in fact, it is easy to verify the impossibility of recovering $V_1$ and $V_2$ when $N = D = 1$.


2.2. Generalization

A natural generalization of (1) is the following problem: given known matrices $X_1, X_2, \ldots, X_{K-1} \in \mathbb{R}^{N \times D}$ and unknown orthogonal matrices $V_1, V_2, \ldots, V_{K-1} \in O(D)$, recover $\{V_i\}_{i=1}^{K-1}$ from

$$X_K = \sum_{i=1}^{K-1} X_i V_i. \tag{10}$$

For this generalized problem, the SDP method is formulated as follows. We first homogenize it to

$$\sum_{i=1}^{K} X_i V_i = 0,$$

and let $H \in \mathbb{R}^{KD \times KD}$ be a Hermitian matrix with the $(i,j)$-th $D \times D$ block given by $H_{ij} = V_i V_j^T$. Then the SDP method solves

$$\min\ \mathrm{tr}(CH), \quad \text{subject to } H \succeq 0 \text{ and } H_{ii} = I \text{ for all } 1 \leq i \leq K, \tag{11}$$

where $H_{ii}$ represents the $(i,i)$-th $D \times D$ block. Then we extract the orthogonal matrices by the procedure described in Section 2.

For this generalized problem and its associated SDP approach, we have the following theoretical guarantee.

Theorem 2.3. For generic $\{X_i\}_{i=1}^{K-1} \subset \mathbb{R}^{N \times D}$, if $N \geq (K-2)D + 1$, then the SDP method recovers $\{V_i\}_{i=1}^{K-1}$ exactly.

Theorem 2.2(a) can be considered as a special case of Theorem 2.3 when $K = 3$. However, for $K > 3$, the condition $N \geq (K-2)D + 1$ is not close to optimal. Since (10) has $ND$ constraints and $D(D-1)(K-1)/2$ variables, the information-theoretic limit is $N = (D-1)(K-1)/2$. Simulations in Section 3 also show that the SDP approach empirically recovers the orthogonal matrices even when $N$ is smaller than $(K-2)D + 1$. However, the theoretical guarantee in Theorem 2.3 is still more powerful than the least squares approach of solving

$$\min_{\{V_i\}_{i=1}^{K-1} \subset \mathbb{R}^{D \times D}} \Big\|X_K - \sum_{i=1}^{K-1} X_i V_i\Big\|_F,$$

which requires $N \geq (K-1)D + 1$ to recover $\{V_i\}_{i=1}^{K-1}$.

3. Numerical Experiments

In this section, we compare several methods for solving (1) and (10) on artificial data sets. The data sets are generated as follows: $\{X_i\}_{i=1}^{K-1}$ are $N \times D$ matrices with i.i.d. standard Gaussian entries $\mathcal{N}(0,1)$, $\{V_i\}_{i=1}^{K-1}$ are random orthogonal matrices (according to Haar measure) generated by QR decomposition of random matrices with i.i.d. standard Gaussian entries, and $X_K$ is generated by (10).
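
A data generator along these lines might look as follows. This is a minimal sketch, not the authors' script: the function names `haar_orthogonal` and `make_instance`, the optional `sigma` noise parameter, and the sign correction applied to the QR factor (a common way to make the QR-based sample exactly Haar distributed) are assumptions of this example.

```python
import numpy as np

def haar_orthogonal(D, rng):
    """Sample an orthogonal matrix via QR of a Gaussian matrix."""
    Q, R = np.linalg.qr(rng.standard_normal((D, D)))
    # Assumption: flip column signs so that R has a positive diagonal,
    # which removes the sign ambiguity of the QR factorization.
    return Q * np.sign(np.diag(R))

def make_instance(N, D, K, rng, sigma=0.0):
    """Generate X_1..X_{K-1}, V_1..V_{K-1} and X_K = sum_i X_i V_i plus optional noise."""
    Xs = [rng.standard_normal((N, D)) for _ in range(K - 1)]
    Vs = [haar_orthogonal(D, rng) for _ in range(K - 1)]
    XK = sum(X @ V for X, V in zip(Xs, Vs))
    XK = XK + sigma * rng.standard_normal((N, D))   # elementwise Gaussian noise
    return Xs, Vs, XK

rng = np.random.default_rng(0)
Xs, Vs, XK = make_instance(N=12, D=5, K=3, rng=rng)
```

With `sigma=0.0` the instance is noiseless; a positive `sigma` contaminates only the observed matrix $X_K$.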

We compare the following five methods:

1. The SDP relaxation approach (SDP) described in Section 2.

2. The naive least squares approach (LS):

$$\min_{\{V_i\}_{i=1}^{K-1} \subset \mathbb{R}^{D \times D}} \Big\|X_K - \sum_{i=1}^{K-1} X_i V_i\Big\|_F^2.$$

3. Since the convex hull of the set of orthogonal matrices is the set of matrices with operator norm not greater than one, we can strengthen the LS approach by constraining its domain (C-LS):

$$\min \Big\|X_K - \sum_{i=1}^{K-1} X_i V_i\Big\|_F^2, \quad \text{subject to } \|V_i\| \leq 1,\ 1 \leq i \leq K-1.$$

4. This is an approach suggested to us by Afonso Bandeira. Let us start with the case $K = 3$. If $V_3 = V_1 V_2^T$, then from (1), $X_3 V_2^T = X_1 V_3 + X_2$ and $X_3 V_1^T = X_1 + X_2 V_3^T$. Then we solve the expanded least squares problem based on these three equations (LS+):

$$\min_{V_1, V_2, V_3} \|X_3 - X_1 V_1 - X_2 V_2\|_F^2 + \|X_3 V_2^T - X_1 V_3 - X_2\|_F^2 + \|X_3 V_1^T - X_1 - X_2 V_3^T\|_F^2.$$

Defining

$$H = \begin{pmatrix} I & V_3 & -V_1 \\ V_3^T & I & -V_2 \\ -V_1^T & -V_2^T & I \end{pmatrix},$$

the optimization problem can be rewritten as

$$\min_{H \in \mathbb{R}^{3D \times 3D}} \|(X_1, X_2, X_3) H\|_F^2, \quad \text{subject to } H = H^T,\ H_{ii} = I.$$

In general, for $K > 3$, this method can be formulated as

$$\min_{H \in \mathbb{R}^{KD \times KD}} \mathrm{tr}(CH^2), \quad \text{subject to } H = H^T \text{ and } H_{ii} = I \text{ for all } 1 \leq i \leq K,$$

where $H_{ij}$ represents the $ij$-th $D \times D$ block of $H$.

5. The LS+ approach with constraints on the operator norm of $H_{ij}$ (C-LS+):

$$\min\ \mathrm{tr}(CH^2), \quad \text{subject to } H = H^T,\ H_{ii} = I \text{ and } \|H_{ij}\| \leq 1 \text{ for all } 1 \leq i, j \leq K.$$
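
The LS+ reformulation in item 4 can be sanity-checked numerically for $K = 3$. The sketch below (illustrative sizes and seed are assumptions) builds the block matrix $H$ from the true $V_1$, $V_2$ and $V_3 = V_1 V_2^T$, verifies that $(X_1, X_2, X_3)H$ vanishes in the noiseless case, and confirms the identity $\|XH\|_F^2 = \mathrm{tr}(CH^2)$ for symmetric $H$ on a noisy instance.

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 7, 3
I = np.eye(D)
X1 = rng.standard_normal((N, D))
X2 = rng.standard_normal((N, D))
V1 = np.linalg.qr(rng.standard_normal((D, D)))[0]
V2 = np.linalg.qr(rng.standard_normal((D, D)))[0]
X3 = X1 @ V1 + X2 @ V2
V3 = V1 @ V2.T

# Symmetric block matrix H with identity diagonal blocks, as defined for LS+.
H = np.block([[I,      V3,    -V1],
              [V3.T,   I,     -V2],
              [-V1.T, -V2.T,   I]])

X = np.hstack([X1, X2, X3])
residual_clean = np.linalg.norm(X @ H, "fro") ** 2       # zero at the true matrices

# Noisy observation: the two forms of the objective still agree.
X_noisy = np.hstack([X1, X2, X3 + 0.1 * rng.standard_normal((N, D))])
C_noisy = X_noisy.T @ X_noisy
residual_noisy = np.linalg.norm(X_noisy @ H, "fro") ** 2
trace_form_noisy = np.trace(C_noisy @ H @ H)             # tr(C H^2)
```

Each block column of $XH$ reproduces one of the three residuals in the expanded least squares objective, which is why the first quantity is zero without noise.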

To compare the SDP/LS+/C-LS+ approaches, we summarize their objective functions and their constraints in Table 1. There are two main differences. First, the objective functions are different. However, since $\mathrm{tr}(CH) = 0$ if and only if $\mathrm{tr}(CH^2) = 0$ (considering $C \succeq 0$ and $H \succeq 0$), this difference does not affect the property of exact recovery. Second, the constraints of the SDP approach are more restrictive than those of the C-LS+ approach ($H_{ii} = H_{jj} = I$ and $H \succeq 0$ imply $\|H_{ij}\| \leq 1$), which is more restrictive than the C-LS approach.

This observation partially justifies the fact that SDP performs better than C-LS+, and C-LS+ performs better than C-LS. However, these interpretations do not justify the empirical finding in Figures 1 and 2 that C-LS+ and SDP behave very similarly in the absence of noise. We leave the explanation of this observation as an open question.


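Table 1: Comparison between SDP, LS+ and C-LS+ approaches.

Method    Objective function      Common constraint           Other constraints
SDP       $\mathrm{tr}(CH)$       $H_{ii} = I$, $H = H^T$     $H \succeq 0$
LS+       $\mathrm{tr}(CH^2)$     $H_{ii} = I$, $H = H^T$     (none)
C-LS+     $\mathrm{tr}(CH^2)$     $H_{ii} = I$, $H = H^T$     $\|H_{ij}\| \leq 1$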


Among these optimization approaches, the LS method has an explicit solution by decomposing it into $D$ sub-problems, where each is a regression problem that estimates $KD$ regression parameters. All other methods are convex and can be solved by CVX [15], where the default solver SeDuMi is used [16]. While the LS+ approach can also be written as a least squares problem with an explicit solution, this problem is not decomposable (unlike the LS method).

When the solution matrices of LS/C-LS are not orthogonal, they are rounded to the nearest orthogonal matrices using the approach in [14]. The rounding procedure of LS+/C-LS+ is the same as that of the SDP method.
Figure 1: The dependence of the mean recovery error (over 50 runs) with respect to $N$, when $D = 10$ (left panel) and $D = 20$ (right panel). The y-axis represents the mean recovery error of $V_1$ in Frobenius norm.

In the first simulation, we aim to find the size of $N$ such that the orthogonal matrices can be exactly recovered by the suggested algorithms for $K = 3$. We let $D = 10$ or $20$ and choose various values for $N$, and record the mean recovery error of $V_1$ (in Frobenius norm) over 50 repeated simulations in Figure 1. The performance of LS verifies our theoretical analysis: it recovers the orthogonal matrices for $N \geq 2D$. LS fails when $N < 2D$ because the null space of $[X_1, X_2]$ is nontrivial and there are infinitely many solutions. Besides, LS+ succeeds when $N \geq 3D/2$. SDP and C-LS+ are the best approaches and they succeed when $N \geq D + 1$, which verifies Theorem 2.2.

In the second simulation, we test the stability of the suggested algorithms when $K = 3$ and the measurement matrix $X_3$ is contaminated elementwise by Gaussian noise $\mathcal{N}(0, \sigma^2)$. We use the setting $N = 12, 16, 22$, $D = 10$ and $\sigma = 0.01$ or $0.1$ and record the mean recovery error over 50 runs in Table 2, which shows that the SDP relaxation approach is more stable to noise than competing approaches. This motivates our interest in studying the SDP approach.


Table 2: The mean recovery error over 50 runs in the noisy setting for $K = 3$ and $D = 10$.

N     σ       SDP      C-LS+    LS+      C-LS     LS
12    0.01    0.071    0.076    0.482    0.508    2.260
16    0.01    0.026    0.031    0.037    0.059    1.926
22    0.01    0.018    0.021    0.020    0.030    0.077
12    0.1     0.742    0.742    1.088    0.880    2.341
16    0.1     0.261    0.328    0.399    0.459    2.034
22    0.1     0.175    0.217    0.216    0.262    0.834


In the third simulation, we compare these methods for $K = 5$ and $D = 5, 10$.

Figure 2: The dependence of the mean recovery error (over 50 runs) with respect to $N$, when $K = 5$, $D = 5$ (left panel) and $K = 5$, $D = 10$ (right panel). The y-axis represents the mean recovery error of $V_1$ in Frobenius norm.


The results are shown in Figure 2. This simulation verifies Theorem 2.3 by showing that the SDP approach successfully recovers the orthogonal matrices for $N \geq (K-2)D + 1$. Indeed, the empirical performance of the SDP approach is even better: it recovers $\{V_i\}_{i=1}^{K-1}$ at $N = 12$ and $25$ respectively, which are smaller than $(K-2)D + 1$. Compared with LS/LS+/C-LS, the SDP and C-LS+ approaches recover the orthogonal matrices with smaller $N$.

At last, we record the running time for all approaches in Table 3. Although the running time is not the main focus of this paper, and CVX is not optimized for the approaches, this table gives a sense of the running times. Table 3 clearly shows that the LS approach is much faster than the other approaches, and the SDP approach is consistently faster than C-LS+. We suspect that this is due to the fact that SDP has fewer constraints, even though the constraint of SDP is more restrictive than that of C-LS+.


Table 3: The average running time (in seconds) when $\sigma = 0.1$, $K = 3$, $D = 10$, $N = 15$ (first row) and $\sigma = 0.1$, $K = 5$, $D = 10$, $N = 35$ (second row).

SDP     C-LS+    LS+     C-LS    LS
0.64    0.76     0.20    0.48    0.0005
5.33    6.58     0.62    0.93    0.0017


4. Proofs of main results

In this section, we first provide the proof for Theorem 2.2 assuming Theorem 2.1 and then provide the proofs for Theorems 2.1 and 2.3. The main reason for this organization is that, given Theorem 2.1 (whose proof is more technical), the proof of Theorem 2.2 is rather straightforward. This organization would also emphasize the importance of Theorem 2.1, which plays an important role in the proof of Theorem 2.2.

4.1. Proof of Theorem 2.2

Part (a) follows from the result in part (b) with $\epsilon = 0$, so it is sufficient to prove part (b).

In the proof of part (b), we first claim that it is sufficient to prove the case $V_1 = V_2 = -I$, i.e., when $X_1 + X_2 + X_3 = 0$ and $\|\tilde{X}_i - X_i\|_F \leq \epsilon$ for $1 \leq i \leq 3$, then the SDP method recovers $\tilde{V}_1$ and $\tilde{V}_2$ such that $\|\tilde{V}_i + I\|_F \leq C\sqrt{\epsilon}$ for $i = 1, 2$.

This result implies that if the input of the SDP method is $(-\tilde{X}_1 V_1, -\tilde{X}_2 V_2, \tilde{X}_3)$ and the output is denoted by $(\hat{V}_1, \hat{V}_2)$, then $\|\hat{V}_i + I\|_F \leq C\sqrt{\epsilon}$ for $i = 1, 2$.

Additionally, it can be verified that if the SDP method outputs $(\tilde{V}_1, \tilde{V}_2)$ when the input is $(X_1, X_2, X_3)$, and $U_1$ and $U_2$ are two orthogonal matrices, then the output for $(X_1 U_1, X_2 U_2, X_3)$ would be $(U_1^T \tilde{V}_1, U_2^T \tilde{V}_2)$. Applying this observation with $U_i = -V_i$, we have $\hat{V}_i = -V_i^T \tilde{V}_i$. Combining it with $\|\hat{V}_i + I\|_F \leq C\sqrt{\epsilon}$ for $i = 1, 2$, the theorem is proved.

The rest of the proof will assume $V_1 = V_2 = -I$ and $X_1 + X_2 + X_3 = 0$. We represent the noisy setting in (8) by $\tilde{X}$, $\tilde{C}$ and $\tilde{H}$, the clean setting by $X$, $C$ and $H$, and write the decomposition of $\tilde{H}$ and $H$ by $\tilde{H} = \tilde{U}\tilde{U}^T$ and $H = UU^T$.

Since $\tilde{U}_i \tilde{U}_i^T = I$, $\|\tilde{U}\| \leq \sum_{i=1}^{3} \|\tilde{U}_i\| = 3$, (30) implies

$$\|X\tilde{U}\|_F - \|\tilde{X}\tilde{U}\|_F \leq \|(X - \tilde{X})\tilde{U}\|_F \leq 3\|X - \tilde{X}\|_F \leq 9\epsilon \tag{12}$$

and following the same argument,

$$\|\tilde{X}U\|_F \leq 9\epsilon + \|XU\|_F. \tag{13}$$

Since $\tilde{H} = \tilde{U}\tilde{U}^T$ is the minimizer of the SDP problem, we have $\|\tilde{X}\tilde{U}\|_F \leq \|\tilde{X}U\|_F$. Combining it with $\|XU\|_F = 0$ and (13),

$$\|\tilde{X}\tilde{U}\|_F \leq 9\epsilon. \tag{14}$$

Combining (12), (14) with Theorem 2.1,

$$9\epsilon \geq \|\tilde{X}\tilde{U}\|_F \geq \|X\tilde{U}\|_F - 9\epsilon \geq c(X)\|\tilde{U}^T P_{L_1^\perp}\|_F^2 - 9\epsilon, \tag{15}$$

so we have

$$\|\tilde{U}^T P_{L_1^\perp}\|_F \leq \sqrt{\frac{18\epsilon}{c(X)}}. \tag{16}$$

Combining it with Lemma 4.2,

$$\|\tilde{U}_1 \tilde{U}_3^T - I\|_F \leq \|\tilde{U}_1 - \tilde{U}_3\|_F = \|\tilde{U}^T (I, 0, -I)^T\|_F \leq \|\tilde{U}^T P_{L_1^\perp}\|_F \, \|P_{L_1^\perp}^T (I, 0, -I)^T\| \leq 2\sqrt{\frac{18\epsilon}{c(X)}}.$$

Since the post-processing step $f(Z)$ is a continuous and differentiable function with respect to $Z$ and $f(-I) = -I$, the difference between the recovered orthogonal matrix $\tilde{V}_1$ and $V_1 = -I$, $\|\tilde{V}_1 + I\|_F$, is bounded above by $C\sqrt{\epsilon}$ when $c(X) > 0$. Similarly we have $\|\tilde{V}_2 + I\|_F \leq C\sqrt{\epsilon}$.


4.2. Proof of Theorem 2.1

Proof. The main idea of the proof is to investigate $U^*$, which is defined to be the nearest matrix to $U$ in the set defined in (17). Then we represent $U^*$ in the form of (21), and show that $\|L\|_F$ is bounded by $C_1\|XU\|_F$, for some $C_1 > 0$. With additional bounds $\|KY_1\|_F, \|KY_2\|_F \leq C_2\sqrt{\|XU\|_F}$ and

$$\|U^{*T} P_{L_1^\perp}\|_F \leq \left\| \begin{pmatrix} (LY_1)^T & (KY_1)^T \\ (LY_2)^T & (KY_2)^T \end{pmatrix} \right\|_F$$

in (24), we will show that $\|U^{*T} P_{L_1^\perp}\|_F$ is bounded above by a function of $\|XU\|_F$. By analyzing the properties of $U^*$, the same statement holds for $\|U^T P_{L_1^\perp}\|_F$, and the theorem is proved.

We first remark that it is sufficient to prove the case $N = D + 1$. If this is true, then for $N > D + 1$, (9) holds when $X$ is replaced by $X'$, the submatrix consisting of the first $D + 1$ rows of $X$. Since $\|XU\|_F \geq \|X'U\|_F$, (9) is proved. Therefore, for the rest of the proof, we assume $N = D + 1$.

We first define the following: 


U = {U & 
U^{ij € 


^3Dxk . ^ 

l3Dxk .XU 


(17) 


and in Us and C /3 represent the submatrix consists of the last D rows of 
U and U. We also define the distances between two matrices and the distances 
between a matrix and a set by 


dist(C/, U') = \\U - C/'IIf, dist(C/,Cf) = min \\U - U'\\f. 

U' ^lA 


Then we have 


dist(C/,W) > Cdist(C/,W), 


(18) 


for (7 = 1/2. The proof of ( |I^ is deferred to Section 4.2.1 


Assuming that (Tniin(A() is the smallest singular value of X, and for any 
matrix A G Sp(A) represents the subspace spanned by the row vectors 

of A in K”, then we have 


||XC/||f > a^i„(X)||Ps;(^)C/||F = 


i(X)dist(C/,W) > Ca^in(X)dist(U,A), 

(19) 


where the last inequality follows from (18). 
Assuming that U* = arg min^g^ dist(f/, [/), then using I C/3 1 e // we


C/3 


have 


dist(C/*,C/) < dist ( ( C/3 ) ,C/ | < 
C/3 


( 20 ) 


Now let us investigate dist(C/*,C/) further. If Yi and I2 S 
are chosen such that [Xi,X2][yi,l2]^ = 0, then using Xi + X 2 + X^ = 0, 
there exist L e K e orthogonal matrix 

such that 


/ I + iYl KYi \ 

U*=\ I + LY 2 KY 2 C/;(. (21) 

VI 0 y 

That is, if we write U* = \ C/| 1 , then 

V ui) 

ill = (I + TYi,iiTYi) c/', ij* = (I + LY2,KY2) c/', ij; = c/3 

(C/3 = c/3 follows from dehnition of U). Since for any Ui S with SVD 

decomposition Ui = C/jy^Sj/^Vj^, the closest orthogonal matrix in is 

given by Uui , so the distance between U* and U is 


dist(C/*,C/) 


\ 


El 




III 


2 

F- 


Applying (20), all singular values of J7i, U 2 and Us (i.e., all diagonal entries of 
smaller than y/SD + 1. Let C' = VSD + 2, then dist(J7,Z^) can 
be controlled as follows: 


3 


C"^dist2(C/,W) = C"2dist^(C/*,C/) > C'^ dist^{U*,U) = - 


^Ei 


sy-iii^ = 


\U*U*^ -1\ 


\u;uF -i\\ 


+ \\u;m^ - 


= \\LYi+Y^^F - Y^^{FL + K)Yi\\% 

+ \\LY2 + Y^^F -Y2^{FL + K^K)Y2\\1 
>{mm{0,X^i^{LY, + Y,^L^))f + X^i^{LY2 + Y^^L^)))^, 


\% 


III 


2 

F 


( 22 ) 

(23) 


where the last inequality follows from the observation that $Y_1^T(L^T L + K^T K)Y_1$ and $Y_2^T(L^T L + K^T K)Y_2$ are positive semidefinite, and for any symmetric matrix $X$, the distance to the nearest positive semidefinite matrix (in Frobenius norm) is at least $-\min(0, \lambda_{\min}(X))$.
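
The distance bound used here can be illustrated numerically. The sketch relies on the standard fact that the Frobenius-nearest positive semidefinite matrix to a symmetric matrix is obtained by clipping its negative eigenvalues to zero; the function name `dist_to_psd` and the random test matrix are assumptions of this example.

```python
import numpy as np

def dist_to_psd(S):
    """Frobenius distance from a symmetric matrix S to the PSD cone."""
    w = np.linalg.eigvalsh(S)
    # Clipping negative eigenvalues gives the nearest PSD matrix, so the
    # distance is the l2 norm of the negative part of the spectrum.
    return np.sqrt(np.sum(np.minimum(w, 0.0) ** 2))

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 6))
S = (A + A.T) / 2                                   # random symmetric matrix
lower_bound = -min(0.0, np.linalg.eigvalsh(S)[0])   # -min(0, lambda_min(S))
distance = dist_to_psd(S)
```

The distance collects all negative eigenvalues, so it is never smaller than the magnitude of the most negative one, which is the inequality invoked in the proof.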
Then (19) and (23) imply 


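$$\lambda_{\min}(LY_1 + Y_1^T L^T) \geq -\frac{C'\|XU\|_F}{C\sigma_{\min}(X)}, \qquad \lambda_{\min}(LY_2 + Y_2^T L^T) \geq -\frac{C'\|XU\|_F}{C\sigma_{\min}(X)}.$$

Now we introduce an important lemma.

Lemma 4.1. For $D > M$, $L \in \mathbb{R}^{D \times M}$ and $Y, Z \in \mathbb{R}^{M \times D}$, if $\lambda_{\min}(LY + Y^T L^T) \geq -\epsilon$ and $\lambda_{\min}(LZ + Z^T L^T) \geq -\epsilon$, where $\lambda_{\min}$ represents the smallest eigenvalue, then $\|L\|_F \leq \epsilon\, c(Y, Z)$, where $c(Y, Z) > 0$ for generic $Y$ and $Z$.

The proof of Lemma 4.1 is rather technical and deferred to Section 4.4.

Applying Lemma 4.1, for generic $Y_1$ and $Y_2$ (and as a result, for generic $X_1$ and $X_2$) there exists $C_1$ depending on $Y_1$ and $Y_2$ such that $\|L\|_F \leq C_1\|XU\|_F$. In addition, we have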

$$\|LY_1 + Y_1^T L^T - Y_1^T(L^T L + K^T K)Y_1\|_F \geq \|Y_1^T K^T K Y_1\|_F - \|Y_1^T L^T L Y_1\|_F - \|LY_1 + Y_1^T L^T\|_F,$$

and $\|Y_1^T K^T K Y_1\|_F \geq \frac{1}{\sqrt{D}}\mathrm{tr}(Y_1^T K^T K Y_1) = \frac{1}{\sqrt{D}}\|KY_1\|_F^2$. As a result, there exists $C_2$ depending on $Y_1$, $Y_2$, $X$ such that if $\|KY_1\|_F > C_2\sqrt{\|XU\|_F}$ then

$$\|LY_1 + Y_1^T L^T - Y_1^T(L^T L + K^T K)Y_1\|_F > \frac{C'\|XU\|_F}{C\sigma_{\min}(X)},$$

which violates (22) and (19). Therefore, by contradiction we proved $\|KY_1\|_F \leq C_2\sqrt{\|XU\|_F}$, and similarly, $\|KY_2\|_F \leq C_2\sqrt{\|XU\|_F}$. Combining it with $\|L\|_F \leq C_1\|XU\|_F$, we have

$$\begin{aligned}
\|U^{*T} P_{L_1^\perp}\|_F &\leq \left\| \begin{pmatrix} (LY_1)^T & (KY_1)^T \\ (LY_2)^T & (KY_2)^T \end{pmatrix} \right\|_F \leq \|LY_1\|_F + \|LY_2\|_F + \|KY_1\|_F + \|KY_2\|_F \\
&\leq \|L\|_F(\|Y_1\| + \|Y_2\|) + \|KY_1\|_F + \|KY_2\|_F \\
&\leq (\|Y_1\| + \|Y_2\|)C_1\|XU\|_F + 2C_2\sqrt{\|XU\|_F}.
\end{aligned} \tag{24}$$

Recall that $C_1$ and $C_2$ only depend on $X_1$ and $X_2$ ($X$, $Y_1$, and $Y_2$ are generated from $X_1$ and $X_2$). Combining (19) and (24), we have

$$\begin{aligned}
\|U^T P_{L_1^\perp}\|_F &\leq \|U^{*T} P_{L_1^\perp}\|_F + \|(U - U^*)^T P_{L_1^\perp}\|_F \leq \|U^{*T} P_{L_1^\perp}\|_F + \|U - U^*\|_F \\
&\leq (\|Y_1\| + \|Y_2\|)C_1\|XU\|_F + 2C_2\sqrt{\|XU\|_F} + \frac{\|XU\|_F}{C\sigma_{\min}(X)}.
\end{aligned}$$

Considering that $\|U\| \leq 3$ and $\|XU\|_F \leq \|U\|\|X\|_F \leq 3\|X\|_F$, if we let $C_4 = 3\|X\|_F$, then

$$\|U^T P_{L_1^\perp}\|_F \leq \left( (\|Y_1\| + \|Y_2\|)C_1\sqrt{C_4} + 2C_2 + \frac{\sqrt{C_4}}{C\sigma_{\min}(X)} \right)\sqrt{\|XU\|_F},$$

and Theorem 2.1 is proved.

4.2.1. Proof of (18)

Suppose that U* = arg min^g^ dist(i7, U) and 


□ 


- ( 

U-U* = \ V 2 

V V3 


where Vi S for i = 1, 2,3, 


then 

[/ = + I ^3 1 

would satisfies that IJ £14, and as a result, 

dist(i7,W) =dist{U,U*) = ^||Vi||^ +11^211?.+ IIV3III > C^JwV + Vfp + m + VsW 
=Cdist{U,U) > Cdist{U,U), 

where C can chosen to be 1/2. 


4.3. Proof of Theorem 2.3

Proof. We start with the same argument as in the proof of Theorem 2.2 and assume $V_i = -I$ for all $1 \leq i \leq K-1$ and $\sum_{i=1}^{K} X_i = 0$. Then the proof can be divided into three steps. First, we show that it is sufficient to prove that a property defined in (26) is satisfied for generic $Y \in \mathbb{R}^{p \times KD}$, where $p = \max((K-1)D - N, 0)$. Then we establish that (26) indeed holds for generic $Y$. To this end, secondly, we show that any $Y$ that does not satisfy this property lies in a certain set. Finally, in the third step, we show that this set is of measure zero.

4.3.1. Step 1: reduction of the problem to the property (26)

By the assumption that $V_i = -I$ for all $1 \leq i \leq K-1$, we have $X_1 + \ldots + X_K = 0$ and

$$\mathrm{tr}(C (I, \ldots, I)^T (I, \ldots, I)) = \mathrm{tr}((I, \ldots, I) C (I, \ldots, I)^T) = \|X_1 + \ldots + X_K\|_F^2 = 0.$$

Considering that $\mathrm{tr}(HC) \geq 0$ for any $H \succeq 0$, if the solution to the SDP problem is not uniquely given by $(I, \ldots, I)^T (I, \ldots, I)$, then there exists $H \neq (I, \ldots, I)^T (I, \ldots, I)$ such that $\mathrm{tr}(CH) = 0$. Let $H = UU^T$ for a matrix $U \in \mathbb{R}^{KD \times k}$, then using the properties of $H$, we have $U \in \mathcal{U}$ for

$$\mathcal{U} = \left\{ U = \begin{pmatrix} U_1 \\ \vdots \\ U_K \end{pmatrix} : U_i \in \mathbb{R}^{D \times k},\ U_i U_i^T = I\ \ \forall\, 1 \leq i \leq K,\ P_{L_1^\perp}^T U \neq 0 \right\},$$

with $L_1$ defined by

$$L_1 = \{z \in \mathbb{R}^{KD} : z = (x, x, \ldots, x) \text{ for some } x \in \mathbb{R}^D\},$$

and $P_{L_1^\perp}^T U \neq 0$ means that $U_1, U_2, \ldots, U_K$ are not all the same.

Since $\mathrm{tr}(CH) = 0$, we have $\|XU\|_F = 0$. Let $\mathrm{Sp}(A)$ and $\mathrm{Col}(A)$ be the subspaces spanned by the row vectors of $A$ and the column vectors of $A$ respectively, then

$$\mathrm{Sp}(X)^\perp \supseteq \mathrm{Col}(U). \tag{25}$$

For any two subspaces $L$ and $L'$, we use $L + L'$ to represent the subspace $\{x + y : x \in L, y \in L'\}$. We claim that to prove Theorem 2.3, it is sufficient to prove the following statement with $p = \max((K-1)D - N, 0)$:

$$\text{For generic } Y \in \mathbb{R}^{p \times KD},\quad \mathrm{Col}(U) \not\subseteq \mathrm{Sp}(Y) + L_1 \text{ for every } U \in \mathcal{U}. \tag{26}$$

The argument is as follows. Since $\sum_{i=1}^{K} X_i = 0$,

$$\dim(\mathrm{Sp}(X)) = \mathrm{rank}(X) = \mathrm{rank}([X_1, X_2, \ldots, X_{K-1}, X_K]) = \mathrm{rank}([X_1, X_2, \ldots, X_{K-1}]),$$

which is $\min((K-1)D, N)$ for generic $\{X_i\}_{i=1}^{K-1} \subset \mathbb{R}^{N \times D}$, so $\mathrm{Sp}(X)$ is a generic $\min((K-1)D, N)$-dimensional subspace in $L_1^\perp$ (and $L_1^\perp$ is a subspace of $\mathbb{R}^{KD}$). So, $\mathrm{Sp}(X)^\perp$ is the sum of $L_1$ and a generic $p$-dimensional subspace in $L_1^\perp$. Therefore, for generic $\{X_i\}_{i=1}^{K-1}$, $\mathrm{Sp}(X)^\perp$ is equivalent to $\mathrm{Sp}(Y) + L_1$, where $Y$ is a generic matrix of size $p \times KD$. If (26) holds, then (25) would not hold for generic $\{X_i\}_{i=1}^{K-1}$ and every $U \in \mathcal{U}$. By the analysis before (25), the solution to the SDP problem is uniquely given by $(I, \ldots, I)^T (I, \ldots, I)$, and Theorem 2.3 is proved.

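4.3.2. Step 2: finding matrices that do not satisfy (26)

In this part we show that every $Y$ violating (26) lies in the set $\cup_{d=1}^{p} \mathcal{M}_d$, where $\mathcal{M}_d$ is the range of the function

$$g_d(\{Y_i\}_{i=1}^{K-1}, Y_0, Z, L, L_0) = P_L\left[(Y_1 P_{L_0}^T, \ldots, Y_K P_{L_0}^T) + (Y_0, \ldots, Y_0)\right] + P_{L^\perp} Z.$$

The domain of the function $g_d$ is as follows: $Y_i \in \mathbb{R}^{d \times d}$ for $1 \leq i \leq K-1$, $Y_0 \in \mathbb{R}^{d \times D}$, $Z \in \mathbb{R}^{(p-d) \times KD}$, $L_0$ is a $d$-dimensional subspace in $\mathbb{R}^D$, and $L$ is a $d$-dimensional subspace in $\mathbb{R}^p$. In addition, $Y_K = -\sum_{i=1}^{K-1} Y_i$.

For every $Y$ violating (26), there exists $U \in \mathcal{U}$ such that $\mathrm{Col}(U) \subseteq \mathrm{Sp}(Y) + L_1$. We let $U = U^{(1)} + U^{(2)}$, where the columns of $U^{(1)}$ lie in $L_1$ and the columns of $U^{(2)}$ are orthogonal to $L_1$. As a result, $\mathrm{Col}(U^{(2)})$ is a subspace in $\mathbb{R}^{KD}$ that intersects $L_1$ only at the origin, and we let $d = \dim(\mathrm{Col}(U^{(2)}))$. We have $d \geq 1$, since otherwise $U^{(2)}$ is a zero matrix, and $P_{L_1^\perp}^T U = P_{L_1^\perp}^T U^{(1)} = 0$, which contradicts $U \in \mathcal{U}$.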
Denote


= 




V Uk j 


and (7(2) = 


/ (7f) \ 


V J 


then U[^'^ = = ■■■ = ul^\ Recall for all 1 < i + 


'K 

^i2)u{2)T = [/.{JT ^ 


Therefore, 


and 


rr(2)r7-(2)T _ jr r(2) *■ r(2) T _ _ »-r{2) »■ r(2) T 

—^2 ^2 ~"'~^K^K 


Col (c/f)) = Col (uP) = ... = Col (C/^^)) , 


(27) 


dim ^Col((7|^))^ = rank < rank = dim ^Col(J7(2))^ = d. 

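Since $\mathrm{Col}(U) = \mathrm{Col}(U^{(2)}) + \mathrm{Col}(U^{(1)})$, and $\mathrm{Col}(U^{(1)}) \subseteq L_1$, the assumption $\mathrm{Sp}(Y) + L_1 \supseteq \mathrm{Col}(U)$ is equivalent to $\mathrm{Sp}(Y) + L_1 \supseteq \mathrm{Col}(U^{(2)})$. Recall that $\dim(\mathrm{Sp}(Y)) \leq p$, and $\mathrm{Col}(U^{(2)})$ is a subspace in $\mathbb{R}^{KD}$ that intersects $L_1$ only at the origin, so we have $d \leq p$.

Applying $\mathrm{Sp}(Y) + L_1 \supseteq \mathrm{Col}(U^{(2)})$ and $\dim(\mathrm{Col}(U^{(2)})) = d$, there exist $Y_0 \in \mathbb{R}^{d \times D}$ and $L$, a $d$-dimensional subspace in $\mathbb{R}^p$, such that

$$\mathrm{Sp}\left(P_L^T[Y - (Y_0, Y_0, \ldots, Y_0)]\right) = \mathrm{Col}(U^{(2)}).$$

Recalling the property (27), there exist $Y_i \in \mathbb{R}^{d \times d}$ for $1 \leq i \leq K$ and $L_0$, a $d$-dimensional subspace in $\mathbb{R}^D$ that contains $\mathrm{Sp}(U_1^{(2)T})$, such that

$$P_L^T[Y - (Y_0, Y_0, \ldots, Y_0)] = (Y_1 P_{L_0}^T, Y_2 P_{L_0}^T, \ldots, Y_K P_{L_0}^T). \tag{28}$$

In addition, since $\mathrm{Col}(U^{(2)})$ is orthogonal to $L_1$,

$$Y_K = -\sum_{i=1}^{K-1} Y_i. \tag{29}$$

Combining (28), (29), and the estimation $1 \leq d \leq p$, every $Y$ that does not satisfy (26) lies in the set $\cup_{d=1}^{p} \mathcal{M}_d$.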


4.3.3. Step 3: counting the dimension of $\mathcal{M}_d$

In this step, we count the dimensions of $\mathcal{M}_d$ for all $1 \leq d \leq p$, and show that they are smaller than $pKD$, the dimension of $\mathbb{R}^{p \times KD}$, which implies that generic $Y \in \mathbb{R}^{p \times KD}$ does not belong to $\cup_{d=1}^{p} \mathcal{M}_d$, and (26) is proved.

For any $d$, the degree of freedom of $\{Y_i\}_{i=1}^{K-1}$ is $(K-1)d^2$, the degree of freedom of $Y_0$ is $Dd$, the degree of freedom of $Z$ is $KD(p-d)$, and the degrees of freedom of $L$ and $L_0$ are $d(p-d)$ and $d(D-d)$ respectively. Considering that $d \leq p \leq D-1$, the total dimension of $\mathcal{M}_d$ is smaller than $KDp$, the dimension of $\mathbb{R}^{p \times KD}$. Since $g_d$ is smooth and the dimension of its range is larger than its domain, all elements in $\mathcal{M}_d$ are its critical values. Applying [17, Theorem 6.8], $\mathcal{M}_d$ has measure zero in $\mathbb{R}^{p \times KD}$. Therefore, $\mathbb{R}^{p \times KD} \setminus \mathcal{M}_d$ is dense. Since $\mathbb{R}^{p \times KD} \setminus \mathcal{M}_d$ is an open set, generic $Y$ does not lie in the set $\mathcal{M}_d$. Combining this result for all $1 \leq d \leq p$, generic $Y$ does not lie in the set $\cup_{d=1}^{p} \mathcal{M}_d$. □


4-4- Proof of Lemma \4-1\ 

We first state two lemmas that are rather easy to verify. 

Lemma 4.2. When A G B G where I > n and the singular values 

of B are (Ji > a 2 >■■■> an, then 


o’nllA.jli;’ < ||Ai?||i? < (Ti||A||f. (30) 

Proof. First we claim that for any $x \in \mathbb{R}^n$,

$$\sigma_n^2\|x\|^2 \le \|x^TB\|^2 \le \sigma_1^2\|x\|^2. \qquad (31)$$

Assuming that the SVD of $B$ is given by $B = U_B\,\mathrm{diag}(\sigma_1,\cdots,\sigma_n)\,V_B^T$,
then the first inequality in (31) can be proved as follows:

$$\|x^TB\|^2 = x^TBB^Tx = x^TU_B\,\mathrm{diag}(\sigma_1^2,\cdots,\sigma_n^2)\,U_B^Tx \ge \sigma_n^2\|x^TU_B\|^2 = \sigma_n^2\|x\|^2,$$

and the second inequality in (31) can be proved similarly.

Assuming that $A = (a_1,\ldots,a_m)^T$, applying (31) with $x = a_i$ for
$1 \le i \le m$ and summing over $i$, (30) is proved. □
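The two-sided bound (30) is easy to sanity-check numerically. The following is a minimal sketch, assuming NumPy is available; the helper name `frobenius_sandwich` and the matrix sizes are illustrative choices, not part of the original argument.

```python
import numpy as np

def frobenius_sandwich(A, B):
    """Return (sigma_n * ||A||_F, ||A B||_F, sigma_1 * ||A||_F).

    A is m x n and B is n x l with l >= n, so B has n singular values.
    """
    sigma = np.linalg.svd(B, compute_uv=False)  # sorted in descending order
    norm_A = np.linalg.norm(A, "fro")
    return sigma[-1] * norm_A, np.linalg.norm(A @ B, "fro"), sigma[0] * norm_A

# Hypothetical sizes chosen only for illustration (l >= n as in the lemma).
rng = np.random.default_rng(0)
m, n, l = 4, 3, 5
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, l))

lower, middle, upper = frobenius_sandwich(A, B)
print(lower <= middle <= upper)
```

The lower bound relies on $B$ having at least as many columns as rows; for a tall $B$ the product $x^TB$ can vanish for nonzero $x$, so the left inequality in (30) would no longer be guaranteed.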

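Lemma 4.3. The smallest eigenvalue of

$$\begin{pmatrix} a & b \\ b & 0 \end{pmatrix}$$

is at most

$$-\frac{b^2}{\max(a,0)+|b|}.$$

Proof. The smaller eigenvalue is

$$\frac{a-\sqrt{a^2+4b^2}}{2} = -\frac{2b^2}{a+\sqrt{a^2+4b^2}} \le -\frac{2b^2}{a+(|a|+2|b|)} = -\frac{b^2}{\max(a,0)+|b|}.$$

□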
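The eigenvalue bound of Lemma 4.3 can be checked on random inputs. This is a small sketch, assuming NumPy; the function names and the sampling range are hypothetical and chosen only for illustration.

```python
import numpy as np

def smallest_eigenvalue(a, b):
    """Smallest eigenvalue of the symmetric matrix [[a, b], [b, 0]]."""
    return np.linalg.eigvalsh(np.array([[a, b], [b, 0.0]]))[0]

def lemma_bound(a, b):
    """The bound -b^2 / (max(a, 0) + |b|); requires b != 0."""
    return -b**2 / (max(a, 0.0) + abs(b))

# Random (a, b) pairs; b == 0 has probability zero under this sampling.
rng = np.random.default_rng(0)
samples = rng.uniform(-5.0, 5.0, size=(1000, 2))

# The lemma claims smallest_eigenvalue <= lemma_bound, so the gap should
# never be positive (up to floating-point error).
worst_gap = max(smallest_eigenvalue(a, b) - lemma_bound(a, b) for a, b in samples)
print(worst_gap <= 1e-9)
```

Equality is attained when $a = 0$, where both sides equal $-|b|$, which is why the bound is stated as a non-strict inequality.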


Proof of Lemma 4.1. First of all, WLOG we may assume that

$$Y = \begin{pmatrix} I_{M\times M} & 0_{M\times(D-M)} \end{pmatrix}. \qquad (32)$$

If we proved the case (32), then other cases can be proved as follows. For generic
$Y$, there are invertible matrices $B \in \mathbb{R}^{M\times M}$, $A \in \mathbb{R}^{D\times D}$ such that

$$BYA = \begin{pmatrix} I_{M\times M} & 0_{M\times(D-M)} \end{pmatrix}.$$

Note that

$$(A^TLB^{-1})(BYA) + (BYA)^T(A^TLB^{-1})^T = A^T(LY+Y^TL^T)A,$$

we have

$$\lambda_{\min}\big((A^TLB^{-1})(BYA) + (BYA)^T(A^TLB^{-1})^T\big) \ge -\|A\|^2\epsilon,$$

and similarly

$$\lambda_{\min}\big((A^TLB^{-1})(BZA) + (BZA)^T(A^TLB^{-1})^T\big) \ge -\|A\|^2\epsilon.$$

Since the case (32) is assumed to be proved,

$$\|A^TLB^{-1}\|_F \le \epsilon\|A\|^2 c(BYA,BZA).$$

Applying Lemma 4.2, $\|A^TLB^{-1}\|_F \ge \|L\|_F\,\sigma_{\min}(A)/\|B\|$, and the generic case
is proved with $c(Y,Z) = \frac{\|A\|^2\|B\|}{\sigma_{\min}(A)}c(BYA,BZA)$, which is positive for generic
$Z$ since $c(BYA,BZA)$ is positive for generic $BZA$.
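The reduction to the normalized form (32) can be illustrated numerically. The sketch below assumes NumPy and picks one possible pair $(A, B)$, namely $B = I$ and $A$ the inverse of a random completion of $Y$ to a square matrix; this particular construction, the sizes, and the variable names are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
M, D = 3, 5                      # hypothetical sizes with M < D
Y = rng.standard_normal((M, D))
L = rng.standard_normal((D, M))

# One possible choice of (A, B): stack Y on top of a random (D-M) x D block,
# which is invertible for generic entries, and invert it. Then Y @ A is the
# first M rows of the identity, i.e. (I, 0), and B can be the identity.
completion = rng.standard_normal((D - M, D))
A = np.linalg.inv(np.vstack([Y, completion]))
B = np.eye(M)

Y_red = B @ Y @ A                      # should equal (I, 0)
L_red = A.T @ L @ np.linalg.inv(B)     # transformed L

S = L @ Y + Y.T @ L.T
S_red = L_red @ Y_red + Y_red.T @ L_red.T

# Smallest eps >= 0 with lambda_min(S) >= -eps, and the spectral norm of A.
eps = max(0.0, -np.linalg.eigvalsh(S)[0])
spec_norm_A = np.linalg.norm(A, 2)

is_normal_form = np.allclose(Y_red, np.hstack([np.eye(M), np.zeros((M, D - M))]))
is_congruent = np.allclose(S_red, A.T @ S @ A)
eig_bound_holds = (
    np.linalg.eigvalsh(S_red)[0] >= -spec_norm_A**2 * eps * (1 + 1e-9) - 1e-9
)
print(is_normal_form, is_congruent, eig_bound_holds)
```

The last check reflects why the factor $\|A\|^2$ appears: a congruence by $A$ can scale the most negative eigenvalue by at most $\|A\|^2$.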

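The rest of the proof will assume (32) and use induction on $k$, which is the
integer such that $k(D-M) < M < (k+1)(D-M)$. For $k = 1$, let us denote

$$L = \begin{pmatrix} L_1 \\ L_2 \end{pmatrix}, \quad Z = (Z_1, Z_2),$$

where $L_1, Z_1 \in \mathbb{R}^{M\times M}$, $L_2 \in \mathbb{R}^{(D-M)\times M}$ and $Z_2 \in \mathbb{R}^{M\times(D-M)}$. Then

$$LY + Y^TL^T = \begin{pmatrix} L_1+L_1^T & L_2^T \\ L_2 & 0 \end{pmatrix}.$$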


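If $\|L_2\| = b$ and $\|L_1\| = a$, then there exist $u \in \mathbb{R}^M$, $v \in \mathbb{R}^{D-M}$ such that
$\|u\| = \|v\| = 1$ and $u^TL_2^Tv = b$. Then

$$\begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix}^T (LY+Y^TL^T) \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} = \begin{pmatrix} u^T(L_1+L_1^T)u & b \\ b & 0 \end{pmatrix},$$

and the smallest eigenvalue of this matrix is at least $-\epsilon$ since

$$\begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix}^T \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} = I.$$

Applying Lemma 4.3 with the estimation $u^T(L_1+L_1^T)u \le 2a$, we have

$$b^2 \le \epsilon(2a+b). \qquad (33)$$

On the other hand, applying Lemma 4.2 we have

For generic $Z$, $C_1\|L_1\|_F \ge \|L_1Z_2\|_F \ge C_2\|L_1\|_F$ for some $C_1, C_2 > 0$.

Therefore, we can find $u \in \mathbb{R}^M$ and $v \in \mathbb{R}^{D-M}$ such that $|u^T(L_1Z_2)v| \ge C_2a$.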

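Note that

$$\begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix}^T (LZ+Z^TL^T) \begin{pmatrix} u & 0 \\ 0 & v \end{pmatrix} = \begin{pmatrix} u^T(L_1Z_1+Z_1^TL_1^T)u & u^T(L_1Z_2)v \\ u^T(L_1Z_2)v & 0 \end{pmatrix} + X',$$

where $\|X'\| \le C_3b$. Since $|u^T(L_1Z_1+Z_1^TL_1^T)u| \le C_4a$ for some $C_4 > 0$,
Lemma 4.3 shows that the smallest eigenvalue of

$$\begin{pmatrix} u^T(L_1Z_1+Z_1^TL_1^T)u & u^T(L_1Z_2)v \\ u^T(L_1Z_2)v & 0 \end{pmatrix}$$

is smaller than

$$-\frac{C_2^2a}{C_4+C_1}.$$

Let $C_5 = \frac{C_2^2}{C_4+C_1}$; applying $\lambda_{\min}(LZ+Z^TL^T) \ge -\epsilon$ we have

$$C_3b - C_5a \ge -\epsilon. \qquad (34)$$

This means $a \le \frac{1}{C_5}(C_3b+\epsilon)$. Plugging it into (33), we have

$$b^2 \le \epsilon(b+2a) \le \epsilon\Big(b + \frac{2}{C_5}(C_3b+\epsilon)\Big),$$

which implies that $b \le C\epsilon$. Applying (34), we also have $a \le C'\epsilon$ and the Lemma
is proved for $k = 1$.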
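The step from the quadratic inequality to $b \le C\epsilon$ can be spelled out by solving for $b$. The following is a sketch under the assumption that (33) and (34) hold as stated; the shorthand $\kappa$ is introduced here only for illustration and does not appear in the original argument.

```latex
\begin{align*}
b^2 &\le \epsilon\Big(b + \tfrac{2}{C_5}(C_3 b + \epsilon)\Big)
      = \kappa\,\epsilon\,b + \tfrac{2}{C_5}\,\epsilon^2,
      \qquad \kappa := 1 + \tfrac{2C_3}{C_5}, \\
b   &\le \frac{\kappa + \sqrt{\kappa^2 + 8/C_5}}{2}\,\epsilon =: C\epsilon, \\
a   &\le \tfrac{1}{C_5}\,(C_3 b + \epsilon)
      \le \tfrac{C_3 C + 1}{C_5}\,\epsilon =: C'\epsilon.
\end{align*}
```

The second line uses that a nonnegative $b$ satisfying $b^2 - \kappa\epsilon b - \tfrac{2}{C_5}\epsilon^2 \le 0$ cannot exceed the larger root of the quadratic, so both constants depend only on $C_3$ and $C_5$.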

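For $k > 1$, let us first write

$$L = \begin{pmatrix} L_{1,1} & L_{1,2} \\ L_{2,1} & L_{2,2} \end{pmatrix},$$

where $L_{1,1} \in \mathbb{R}^{M\times(2M-D)}$, $L_{1,2} \in \mathbb{R}^{M\times(D-M)}$, $L_{2,1} \in \mathbb{R}^{(D-M)\times(2M-D)}$,
$L_{2,2} \in \mathbb{R}^{(D-M)\times(D-M)}$, and similarly write

$$Z = \begin{pmatrix} Z_{1,1} & Z_{1,2} \\ Z_{2,1} & Z_{2,2} \end{pmatrix},$$

where $Z_{1,1} \in \mathbb{R}^{(2M-D)\times M}$, $Z_{2,1} \in \mathbb{R}^{(D-M)\times M}$, $Z_{1,2} \in \mathbb{R}^{(2M-D)\times(D-M)}$, $Z_{2,2} \in
\mathbb{R}^{(D-M)\times(D-M)}$. WLOG we may assume that $Z_{1,2} = 0$, by finding an appropriate
orthogonal matrix $A \in \mathbb{R}^{M\times M}$ and considering $(LA^T, AY, AZ)$ instead of
$(L, Y, Z)$; then

$$Z = \begin{pmatrix} Z_{1,1} & 0 \\ Z_{2,1} & Z_{2,2} \end{pmatrix},$$

and for generic $Z$, $Z_{2,2}$ is full-rank, i.e., the rank is $D-M$. Let $b = \|(L_{2,1}, L_{2,2})\|_F$,
$a_1 = \|L_{1,1}\|_F$ and $a_2 = \|L_{1,2}\|_F$. Then following the proof of (33), applying
$\lambda_{\min}(LY+Y^TL^T) \ge -\epsilon$ we have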


$$\frac{b^2}{a_1+a_2+b} \le \epsilon. \qquad (35)$$


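On the other hand,

$$LZ+Z^TL^T = \begin{pmatrix} L_{1,1}Z_{1,1}+L_{1,2}Z_{2,1}+Z_{1,1}^TL_{1,1}^T+Z_{2,1}^TL_{1,2}^T & L_{1,2}Z_{2,2} \\ Z_{2,2}^TL_{1,2}^T & 0 \end{pmatrix} + X',$$

where $\|X'\| \le C_1b$. Again Lemma 4.2 means that we have $C_2a_2 \le \|L_{1,2}Z_{2,2}\|_F \le
C_3a_2$ and $\|L_{1,1}Z_{1,1}+L_{1,2}Z_{2,1}+Z_{1,1}^TL_{1,1}^T+Z_{2,1}^TL_{1,2}^T\| \le C_4(a_1+a_2)$. Lemma 4.3
then implies

$$\frac{C_2^2a_2^2}{C_4(a_1+a_2)+C_3a_2} \le \epsilon + C_1b. \qquad (36)$$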

At last, note that $L_{1,1}Z_{1,1}+L_{1,2}Z_{2,1}+Z_{1,1}^TL_{1,1}^T+Z_{2,1}^TL_{1,2}^T$ is a submatrix
of $LZ+Z^TL^T$, so its smallest eigenvalue is also larger than $-\epsilon$. Since there
exists $C_5$ such that $\|L_{1,2}Z_{2,1}+Z_{2,1}^TL_{1,2}^T\| \le C_5a_2$, the smallest eigenvalue of
$L_{1,1}Z_{1,1}+Z_{1,1}^TL_{1,1}^T$ is larger than $-\epsilon - C_5a_2$. Let

$$Y_{1,1} = \begin{pmatrix} I_{(2M-D)\times(2M-D)} & 0_{(2M-D)\times(D-M)} \end{pmatrix},$$

and using the same argument, the smallest eigenvalue of $L_{1,1}Y_{1,1}+Y_{1,1}^TL_{1,1}^T$ is
larger than $-\epsilon - C_5'a_2$. Since $(k-1)(D-M) < 2M-D < k(D-M)$, we may
apply the case $k-1$ to $L_{1,1}$, $Y_{1,1}$ and $Z_{1,1}$ and have

$$a_1 \le C_6\big((C_5+C_5')a_2+\epsilon\big). \qquad (37)$$

Plugging (37) into (35) and (36), we have

$$b^2 \le \epsilon C_7(\epsilon+a_2+b) \qquad (38)$$

and

$$a_2^2 \le C_8(\epsilon+a_2)(\epsilon+b), \quad \text{i.e.,} \quad a_2(a_2-C_8b) \le \epsilon C_8(\epsilon+a_2+b). \qquad (39)$$


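If $a_2 \ge 2C_8b$, then (39) implies

$$\frac{1}{2}a_2^2 \le \epsilon C_8\Big(\epsilon+a_2+\frac{1}{2C_8}a_2\Big),$$

which implies $a_2 \le C\epsilon$. This then implies $b \le C'\epsilon$ (from assumption) and
$a_1 \le C''\epsilon$ (from (37)).

If $a_2 \le 2C_8b$, then (38) implies $b \le C\epsilon$, which then implies $a_1, a_2 \le C'\epsilon$.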


For either case we have $a_1+a_2+b \le C'''\epsilon$, and Lemma 4.1 is proved since

$$\|L\|_F \le \|L_{1,1}\|_F + \|L_{1,2}\|_F + \|(L_{2,1},L_{2,2})\|_F = a_1+a_2+b.$$

□